Method, apparatus and software for lossy data compression and function estimation

ABSTRACT

There is provided a method of producing compressed data including the steps of receiving input data to be compressed (step 10), transforming said input data into a sum of basis functions multiplied by a corresponding set of coefficients, which may be a transformation to the frequency domain (step 12), forming a function or row of data from said set of coefficients (step 17), estimating the function using a learning process (step 21) and recording the resulting estimate (step 28) as the compressed data. Also provided is a method of estimating a function, which may be used in the method of producing compressed data, including inverting the function about a predetermined value (step 18) prior to using a learning process to estimate the function. Apparatus (1) for implementing the method and a software product to cause a processing means to implement the method are also claimed.

TECHNICAL FIELD

The present invention relates to methods, apparatus and software for use in lossy data compression and to function estimation. In particular, but not exclusively, the present invention relates to still and moving image compression and/or sound and audio data compression using learning processes.

BACKGROUND ART

The use of data compression has become increasingly important for information communication and storage given the now massive amounts of information communicated and stored electronically world-wide. By more effectively compressing data, bandwidth in communication channels and/or memory space in storage systems can be freed, allowing increased communication and/or storage capacity respectively.

The ISO/IEC 10918-1 standard, commonly referred to as JPEG, has become probably the most popular form of image compression. However, JPEG becomes increasingly less effective for lossy compression at ratios exceeding approximately 30:1.

A compression method that results in improved quality of image restoration over JPEG compression for compression ratios exceeding 30:1 has been presented in C. Amerijckx et al., “Image compression by self-organized Kohonen map”, IEEE Transactions on Neural Networks, vol. 9, no. 3, May 1998. This method involves transforming the image data using the discrete cosine transform (DCT), vector quantizing the DCT coefficients by a topological self-organising map (Kohonen map), differential coding by a first order predictor and entropic encoding of the differences.

In the specification of U.S. Pat. No. 5,950,146, the use of support vectors in combination with a neural network is described to assist in overcoming the problem of exponential increases in processing requirements with linear dimensional increases in estimations of functions. The method involves providing a predetermined error tolerance, which is the maximum amount that an estimate function provided by a neural network may deviate from the actual data.

The advantage of using support vector machines to alleviate the problem of disproportionate computational increases for dimensional increases for image compression was identified in: J. Robinson and V. Kecman, “The use of support vectors in image compression”, Proceedings of the 2nd International Conference on Engineering Intelligent Systems, University of Paisley, June 2000. The use of support vector machines goes some way to providing improved image quality relative to that obtained from JPEG compression for compression ratios greater than 30:1.

Neural networks have also been used for image compression; see for example the specification of U.S. Pat. No. 5,005,206. After the image is defined by a suitable function, which is typically a transformation of the image data into a different domain, a neural network is trained on the function to produce an estimate of the function. The image is reconstructed from the weights of the neural network.

A problem with neural networks is that for highly varying data, the error of the resulting estimate may be substantial. To reduce the error, a large number of points may be considered. However, this increases the computational burden and decreases compression. Even using support vectors, the computational burden to enable a solution to be computed within a required error may be prohibitively high.

Thus, it is an object of the present invention to provide a method, apparatus and/or software product that provide improved accuracy in data compression and/or in function estimation over existing methods, apparatus and software for comparative processing effort, or at least to provide the public with a useful alternative.

Further objects of the present invention may become apparent from the following description.

Any discussion of the prior art throughout the specification should in no way be considered as an admission that such prior art is widely known or forms part of common general knowledge in the field.

SUMMARY OF THE INVENTION

According to a first aspect of the invention, there is provided a method of producing compressed data including the steps of:

-   a) receiving input data to be compressed;
-   b) transforming said input data into a sum of basis functions
    multiplied by a corresponding set of coefficients;
-   c) forming a function from said set of coefficients;
-   d) estimating said function using a learning process and recording
    the resulting estimate;

    wherein said estimate defines the compressed data.

Preferably, the method may further include subdividing input data received in step a) into a number of portions and performing steps b) through d) on each of said portions.

Preferably, the step of forming a function from said set of coefficients may include first removing a DC component from each said portion and recording the DC components as part of the compressed data.

Preferably, the method may further include encoding said DC components prior to recording them as part of the compressed data.

Preferably, the step of forming a function may include:

-   i) defining a first function as a series of said coefficients
    alternating about a predetermined value;
-   ii) creating a second function by inverting identified portions of
    the first function on one side of the predetermined value and
    recording inversion information defining the portions that have been
    inverted;

    wherein said second function defines the function to be estimated in
    step d).

Preferably, the learning process identifies a series of basis vectors and the method may further include associating each basis vector with a corresponding coefficient in said function.

Preferably, the method may further include discarding the inversion information for any coefficient that does not have an associated basis vector.

Preferably, the method may further include discarding weights calculated by the learning process that do not have an associated basis vector.

Preferably, the method may further include discarding the inversion information for any coefficient where one or both of its absolute value and reconstructed value is less than a predetermined magnitude.

Preferably, the method may include storing the basis vectors as a data series, wherein the position of a value in the data series defines the basis vector and the value defines the basis vector's weight.

Preferably, the method may include shifting the values in the data series so that all the values that have not been discarded have the same sign and are non-zero, recording the value by which the values have been shifted as part of the compressed data and combining the basis vectors with the inversion information by changing the sign of the values dependent on the inversion information.

Preferably, the method may include encoding the data series using entropy encoding.

Preferably, the step of transforming said input data into a sum of basis functions multiplied by a corresponding set of coefficients may include applying the discrete cosine transform to the input data.

Preferably, the input data may define a two-dimensional image.

According to a second aspect of the invention there is provided a method of creating information defining an estimate of a function, the method including:

-   i) receiving input information including a number of values defining
    a first function, wherein the first function is to be estimated;
-   ii) creating a second function by inverting identified portions of
    the first function on one side of a predetermined value and
    recording inversion information defining the portions that have been
    inverted;
-   iii) estimating said second function using a learning process and
    recording information defining a resulting estimate;

    wherein the resulting estimate and inversion information are
    collectively the information defining an estimate of the first
    function.

Preferably, the learning process may compute a series of basis vectors and the method includes associating each basis vector with a corresponding value in said second function.

Preferably, the method may further include discarding the inversion information for any value that does not have an associated basis vector.

Preferably, the method may further include discarding weights computed during learning machine training that do not have an associated basis vector.

Preferably, the method may further include discarding the inversion information for any coefficient where one or both of its absolute value and reconstructed value is less than a predetermined magnitude.

Preferably, the method may include storing the basis vectors as a data series, wherein the position of a value in the data series defines the basis vector and the value defines the basis vector's weight.

Preferably, the method may include shifting the values in the data series so that all the values that have not been discarded have the same sign and are non-zero, recording the value by which the values have been shifted as part of the information defining an estimate of the first function and combining the basis vectors with the inversion information by changing the sign of the values dependent on the inversion information.

Preferably, the method may include encoding the data series using entropy encoding.

According to a third aspect of the present invention, there is provided computerised apparatus for producing compressed data, the computerised apparatus including receiving means to receive input data to be compressed and processing means operable to:

-   a) transform said input data into a sum of basis functions
    multiplied by a corresponding set of coefficients;
-   b) form a function from said set of coefficients;
-   c) estimate said function using a learning process;

    wherein said estimate defines the compressed data.

Preferably, the processing means may be operable to form a function by:

-   i) defining a first function as a series of said coefficients
    alternating about a predetermined value;
-   ii) creating a second function by inverting identified portions of
    the first function on one side of the predetermined value and
    recording inversion information defining the portions that have been
    inverted;

    wherein said second function defines the function to be estimated in
    step c).

Preferably, the computerised apparatus may be used to compress image or sound information, wherein the processing means is operable to transform image information from the spatial domain to the frequency domain to create said set of coefficients.

According to a fourth aspect of the present invention, there is provided computerised apparatus for creating information defining an estimate of a function, the computerised apparatus including receiving means for receiving input information including a number of values defining a first function, wherein the first function is to be estimated, and processing means operable to:

-   i) create a second function by inverting identified portions of the
    first function on one side of a predetermined value and recording
    inversion information defining the portions that have been inverted;
-   ii) estimate said second function using a learning process and
    record information defining a resulting estimate;

    wherein the resulting estimate and inversion information are
    collectively the information defining an estimate of the first
    function.

According to a fifth aspect of the present invention, there is provided a software product for producing compressed data, the software product including an instruction set including instructions to cause a processing means or a combination of processing means to:

-   a) transform said input data into a sum of basis functions
    multiplied by a corresponding set of coefficients;
-   b) form a function from said set of coefficients;
-   c) estimate said function using a learning process;

    wherein the estimate defines the compressed data.

According to a sixth aspect of the present invention, there is provided a software product for determining an estimate of a function, the software product including an instruction set including instructions to cause a processing means or a combination of processing means to:

-   i) receive information defining a first function, wherein the first
    function is to be estimated;
-   ii) create a second function by inverting identified portions of the
    first function on one side of a predetermined value and recording
    inversion information defining the portions that have been inverted;
-   iii) estimate said second function using a set of parameters that
    number less than the number of values in said input information and
    record information defining the resulting estimate;

    wherein the resulting estimate and inversion information are
    collectively the information defining an estimate of the first
    function.

According to a seventh aspect of the present invention, there is provided a computer readable medium containing the software product as described in either of the two immediately preceding paragraphs.

Further aspects of the present invention, which should be considered in all its novel aspects, may become apparent from the following description, given by way of example only and with reference to the accompanying drawings.

BRIEF DESCRIPTION OF DRAWINGS

FIG. 1: Shows a block diagram representation of a computational environment in which the present invention may be implemented;

FIG. 2: Shows a flow diagram of the steps to compress an image in accordance with one embodiment of the present invention;

FIG. 3: Shows a matrix defining an 8×8 block of image data in the spatial domain and a matrix defining the same block in the frequency domain;

FIG. 4: Shows a zig-zag pattern through an 8×8 block used to form a function in accordance with an aspect of the present invention;

FIG. 5: Shows a plot of the function formed by following the zig-zag pattern of FIG. 4 along matrix T in FIG. 3;

FIG. 6: Shows a plot of the absolute value of the function in FIG. 5;

FIG. 7: Shows the reconstructed coefficients resulting from learning machine training applied to the function shown in FIG. 6;

FIG. 8: Shows the error between the input function and its estimate resulting from data compressed using the results of learning machine training shown in FIG. 7;

FIG. 9: Shows the results of learning machine training applied to the function shown in FIG. 5;

FIG. 10: Shows the error between the input function and its estimate resulting from data compressed using the results of learning machine training shown in FIG. 9;

FIG. 11: Shows a graphical representation of the variables used in a learning machine training process;

FIG. 12: Shows a plot of compression ratio versus signal-to-noise ratio for the compression method of the present invention in comparison to JPEG compression;

FIG. 13: Shows examples of reconstructed images from compressed image data formed by the method of the present invention and by using the JPEG compression algorithm.

MODES FOR CARRYING OUT THE INVENTION

The present invention provides a method of data compression and a method of function estimation, which may be used in the method of data compression. The invention may have particular application to the compression of two-dimensional black-and-white or colour still or moving images, and the following description is given by way of example with particular reference to this application. However, those skilled in the relevant arts will recognise that the invention may have alternative applications in the field of statistical learning. Such alternative applications may include compression of audio data files, pattern recognition, modelling of complex systems, search algorithms, data filtering and decision making systems.

In the application of two-dimensional image compression, the invention involves first transforming the image from the spatial domain into a sum of basis functions multiplied by a corresponding set of coefficients. The component weights or coefficients of the transformed data are then expressed as a function (coefficient function) and a learning process is applied to the coefficient function. In one embodiment, the method includes the aspect of the invention used for function estimation, which includes inverting portions of the coefficient function in order to produce a second function that is more accurately modelled by a learning process for the same number of model parameters. A record is kept of the portions of the coefficient function that are inverted and this record is used during reconstruction of the image.

FIG. 1 shows a block diagram representative of a possible computation system 1 that may be used to implement the compression method of the present invention. The computation system 1 includes an input/output (I/O) interface 2 to receive image data, which may be a single communication channel or multiple communication channels. A processor 3, including a learning machine 4, is in communication with the I/O interface 2. The processor 3 stores the image data in memory 5 and performs the processing steps as described herein below according to algorithms stored in program memory 6. The learning machine 4 performs the learning process and outputs the results to memory, for example to memory 5, and/or outputs the results to an output, for example the I/O interface 2. Memory 5 and program memory 6 may be any suitable computer readable medium, readable by the processor 3. Also, the processor 3 may be any suitable processing means including microprocessors, digital signal processors, microcontrollers or customised hardware.

FIG. 2 shows a flow chart of the steps that may be performed by the computational system 1 to compress a two-dimensional image in accordance with the present invention. The image to be compressed is first received as data through the I/O interface 2 and stored in memory 5 (step 10). Optionally, the image may be subdivided into blocks, for example by subdividing the image data into 8×8 matrices (step 11), prior to further processing. Where an image is not an integral number of 8×8 blocks, the image can be “padded” with white pixels to expand it to an integral number of 8×8 blocks. Step 11 may be advantageous for spatial to frequency transformations that are more efficient on smaller images, such as the discrete cosine transform (DCT).
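By way of illustration only (this sketch is not part of the original disclosure), the padding and subdivision of step 11 might be implemented as follows; the NumPy array layout and the use of 255 for a white pixel are assumptions:

```python
import numpy as np

def split_into_blocks(image, block_size=8, pad_value=255):
    """Pad a greyscale image with white pixels to a whole number of
    blocks (step 11), then return the block_size x block_size blocks
    in raster order."""
    h, w = image.shape
    pad_h = (-h) % block_size          # rows needed to reach a multiple of 8
    pad_w = (-w) % block_size          # columns needed
    padded = np.pad(image, ((0, pad_h), (0, pad_w)),
                    mode="constant", constant_values=pad_value)
    return [padded[r:r + block_size, c:c + block_size]
            for r in range(0, padded.shape[0], block_size)
            for c in range(0, padded.shape[1], block_size)]
```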

In the preferred embodiment of the invention, each block is then transformed to the frequency domain (step 12), producing an 8×8 matrix of DCT coefficients for each block. The discrete cosine transform (DCT) may be used as the spatial to frequency transformation. For two-dimensional images, the 2-D DCT shown in equation (1) is used.

$$T[i,j] = c(i,j)\sum_{x = 0}^{N - 1}\sum_{y = 0}^{N - 1} V[x,y]\cos\frac{(2y + 1)\,i\,\pi}{2N}\cos\frac{(2x + 1)\,j\,\pi}{2N} \qquad \text{equation (1)}$$

where c(i,j) = 2/N for i and j ≠ 0, and c(i,j) = 1/N for i and j = 0.
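As an informal aid, equation (1) can be transcribed directly into code. The sketch below is a naive O(N⁴) implementation written for clarity rather than speed, and it follows the normalisation c(i,j) exactly as printed above:

```python
import numpy as np

def dct2(V):
    """2-D DCT of an N x N block, a direct transcription of equation (1)."""
    N = V.shape[0]
    T = np.zeros((N, N))
    for i in range(N):
        for j in range(N):
            c = 1.0 / N if (i == 0 and j == 0) else 2.0 / N
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (V[x, y]
                          * np.cos((2 * y + 1) * i * np.pi / (2 * N))
                          * np.cos((2 * x + 1) * j * np.pi / (2 * N)))
            T[i, j] = c * s
    return T
```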

Other transformations that convert the spatial domain data into a set of coefficients for a set of basis functions may also be applied, including the Fourier transform or a wavelet transformation. However, a transformation to the frequency domain is preferred due to the way humans perceive visual information. Where other forms of input data, such as audio data, are used, an appropriate transformation is used to transform that data to the frequency domain.

By performing a spatial to frequency transformation such as the DCT, advantage is taken of the way the human eye perceives images. The advantage results from the tendency of the magnitude of the frequency components in an image to diminish with higher frequency. As the higher frequency components are less visible to the eye, they can be removed with little noticeable effect on the image quality.

An example transformation of an 8×8 matrix is shown in FIG. 3. Matrix S is the spatial domain matrix and matrix T is the corresponding frequency domain matrix, specifying the coefficients resulting from the computation of the DCT. The spatial domain matrix may be reconstructed from the frequency domain matrix using the inverse discrete cosine transform (IDCT) shown in equation (2).

$$V[x,y] = \sum_{i = 0}^{N - 1}\sum_{j = 0}^{N - 1} c(i,j)\,T[i,j]\cos\frac{(2y + 1)\,i\,\pi}{2N}\cos\frac{(2x + 1)\,j\,\pi}{2N} \qquad \text{equation (2)}$$

where, as in equation (1), c(i,j) = 2/N for i and j ≠ 0, and c(i,j) = 1/N for i and j = 0.

The magnitude of the higher frequency components may be reduced by dividing the elements in the transformed matrix T by the elements in a quantization matrix (step 13). This process is similar to that which is performed for the JPEG standard. To increase the compression, the elements in the quantization matrix may be multiplied by a single coefficient before dividing the transformed matrix (step 14). Having performed this transformation, the top left component of the transformed matrix may be termed the “DC” coefficient or component and the remaining components of the transformed matrix the “AC” components.
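A minimal sketch of steps 13 and 14, assuming a quantization matrix Q and a single scaling coefficient k, neither of which is fixed by the text:

```python
import numpy as np

def quantize_block(T, Q, k=1.0):
    """Divide the DCT coefficient matrix T by a scaled quantization
    matrix and round to integers (steps 13 and 14)."""
    return np.round(T / (Q * k)).astype(int)
```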

In order to obtain a function that may be modelled by a learning process, a sequence of values, or row R, is required. To form this row R, a sequence of values may be formed by following a zig-zag pattern through the matrix T (step 17). Using this pattern, which is shown in FIG. 4, the importance of each coefficient is reflected in its position in the sequence. The top left coefficient of the matrix is the “DC” component and can be interpreted as setting the average colour of the block defined by the matrix. The DC component for each matrix may be removed from the row and recorded separately (step 15) and the data defining the DC components may be compressed using entropy encoding (step 16). Lossless or lossy data compression may be used for the DC components depending on requirements for the compression process.
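The row formation of steps 15 and 17 might be sketched as below. The traversal is assumed to be the conventional anti-diagonal zig-zag scan; the exact pattern of FIG. 4 may differ in detail:

```python
def zigzag_indices(n=8):
    """Visit an n x n matrix along anti-diagonals, alternating direction,
    approximating the zig-zag pattern of FIG. 4."""
    order = []
    for s in range(2 * n - 1):
        diag = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        order.extend(diag if s % 2 else diag[::-1])
    return order

def form_row(T):
    """Form the row R from matrix T with the DC component removed
    (steps 15 and 17); the DC component is returned separately."""
    idx = zigzag_indices(len(T))
    dc = T[0][0]
    row = [T[i][j] for (i, j) in idx[1:]]   # skip the DC component
    return dc, row
```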

In an alternative embodiment, the DC components may remain part of the row. However, this may reduce the quality of compression.

For the matrix T shown in FIG. 3, a row R formed by following the abovementioned zig-zag pattern and with the DC component removed is: R1 (partially shown) = [−81, −44, 10, 20, 45, −43, −4 . . . −2, 0, 5, 1]. The row R and DC component for each matrix subdivided from the image data in step 11 are stored in the memory 5 of the computational system 1. If the DC components are encoded, the resultant encoded data may replace the actual DC components in the memory 5.

FIG. 5 shows a plot of the row R1 formed from the 63 components (the “DC” component omitted), formed by the DCT of a normalised matrix S, normalised to be between −1 and 1 from the original colour range of 0 to 255. The independent variable x represents the position of the component in the row R1. Resulting from the transformation to the frequency domain, the magnitude of the DCT coefficients R1(x) generally decreases further along the row. However, the sign (whether R1(x) is positive or negative) appears relatively random along the row, causing large swings in the input data. FIG. 6 shows a plot of a row R1′, which is the absolute value of the row R1. The plot has a smoother profile, which is better suited to estimation by a learning process. Therefore, in a preferred embodiment the absolute value of each DCT coefficient in the row R is computed and stored to form R′ (step 18).

Although it is convenient to use the absolute value of the function, so that the values less than zero are inverted, the function may be inverted about another predetermined value as required. Values other than zero may be selected to obtain a smoother row R′.

If the image is to be compressed on the basis of an estimate of a row R′ of absolute values, then to accurately reconstruct the image, there should be a way of identifying and recording which coefficients were inverted (step 19). For discrete or digital information, a convenient means of recording when a value in the function has been inverted is a single bit. For example, a “1” may represent that the value has been inverted and a “0” may represent that the value has not been inverted. This information is also recorded in a row I for use in reconstructing the image when required. For the row R1, the corresponding row I1 would be: I1 (partially shown) = [1, 1, 0, 0, 0, 1, 1 . . . 1, 0, 0, 0].
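A minimal sketch of steps 18 and 19; applied to the example row R1 above it reproduces the leading values of R1′ and I1:

```python
def invert_about_zero(row):
    """Form R' (absolute values, step 18) and the inversion row I
    (step 19): I holds a 1 wherever a value was inverted."""
    r_prime = [abs(v) for v in row]
    inv_bits = [1 if v < 0 else 0 for v in row]
    return r_prime, inv_bits

# Example: invert_about_zero([-81, -44, 10, 20, 45, -43, -4])
# gives ([81, 44, 10, 20, 45, 43, 4], [1, 1, 0, 0, 0, 1, 1]).
```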

In an optional additional step to increase compression, any recovered value of the DCT coefficient that has an absolute value less than a predefined error value may have its inversion bit set to zero (step 20). That is, for recovered values close to zero the inversion bit may be discarded. In an alternative embodiment, the magnitude of the original DCT coefficients, rather than the recovered DCT coefficients, may be used. This may be preferred in some cases in order to reduce the computation time for the compression.

Once the row R1 has been finalised in steps 17 through 20, the learning machine 4 is used to apply a learning process to the row R1 in order to provide an estimate of row R1 using fewer parameters (step 21). Any suitable learning process may be used, including:

-   a) neural networks, particularly radial basis function (RBF)
    networks and multilayer perceptrons;
-   b) kernel based models;
-   c) wavelet networks;
-   d) support vector machines;
-   e) coordinate ascent based learning (Kernel AdaTron or Successive
    Over-Relaxation);
-   f) Fourier or polynomial series modelling; and
-   g) quadratic programming (QP) or linear programming (LP) based
    learning.

Those skilled in the relevant arts will appreciate that other suitable learning processes exist or may be developed that may be used with the present invention.

Kernel AdaTron and support vector machines (SVMs), otherwise known as kernel machines, may be used as the learning machine 4. Each of these techniques is described by way of example herein below.

Initially developed for solving classification problems, SVM techniques can be successfully applied in regression, i.e. for functional approximation problems; see for example U.S. Pat. No. 5,950,146 (Vapnik), the contents of which are hereby incorporated herein in their entirety. Unlike pattern recognition problems where the desired outputs y_(i) are discrete values, in data compression applications real valued functions are dealt with.

The general regression learning problem is set as follows: the learning machine is given l training data points, from which it attempts to learn the input-output relationship (dependency, mapping or function) f(x). A training data set:

$$D = \left\{ \left[ x_i, y_i \right] \in R^n \times R,\; i = 1, \ldots, l \right\}$$

consists of l pairs (x₁, y₁), (x₂, y₂), . . . , (x_l, y_l), where the inputs x are n-dimensional vectors x ∈ Rⁿ and the system responses y ∈ R are continuous values. The SVM considers approximating functions of the form shown in equation (3).

$$f(x, w) = \sum_{i = 1}^{N} w_i\,\phi_i(x) \qquad (3)$$

where the functions φ_(i)(x) are called features, kernels or basis functions. Equation (3) is an SVM model where N is the number of support vectors. In the case of SVM regression, one uses Vapnik's linear loss function, equation (4), with an ε-insensitivity zone as a measure of the error of approximation:

$\begin{matrix}{{{y - {f\left( {x,w} \right)}}} = \left\{ \begin{matrix}{{{{0\mspace{14mu}{if}\mspace{14mu}{{y - {f\left( {x,w} \right)}}}} \leq} \in}\mspace{76mu}} \\{{{{{y - {f\left( {x,w} \right)}}} -} \in},{{otherwise}.}}\end{matrix} \right.} & (4)\end{matrix}$

If the predicted value is within the ε-insensitivity zone the loss(error or cost) is zero. For all other predicted points outside theε-insensitivity zone, the loss equals the magnitude of the differencebetween the predicted value and the radius ε of the tube.
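For illustration, equation (4) reads directly as the following function; the names are chosen here for clarity and are not from the source:

```python
def epsilon_insensitive_loss(y, f, eps):
    """Vapnik's linear loss with an epsilon-insensitivity zone
    (equation (4)): zero inside the zone, linear outside it."""
    deviation = abs(y - f)
    return 0.0 if deviation <= eps else deviation - eps
```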

In solving regression problems the SVM performs linear regression in n-dimensional feature space using the ε-insensitivity loss function. At the same time, it tries to reduce model capacity by minimizing ∥w∥², in order to ensure better generalization. All these are achieved by minimizing equation (5):

$$\Re_{w,\xi,\xi^{*}} = \frac{1}{2}\lVert w \rVert^{2} + C\left( \sum_{i = 1}^{l} \xi_i + \sum_{i = 1}^{l} \xi_i^{*} \right) \qquad (5)$$

under the constraints:

$$y_i - f(x_i, w) \leq \varepsilon + \xi_i, \qquad f(x_i, w) - y_i \leq \varepsilon + \xi_i^{*}, \qquad \xi_i \geq 0, \qquad \xi_i^{*} \geq 0$$

where ξ and ξ* are optional slack variables for measurements ‘above’ and ‘below’ an ε-insensitivity zone respectively.

Both slack variables are positive values and they measure the deviation of the data from the prescribed ε-insensitivity zone. If used, their magnitude can be controlled by a penalty parameter C. This optimization problem is typically transformed into the dual problem, and its solution is given by equation (6):

$$f(x) = \sum_{i = 1}^{N_{SV}} \left( \alpha_i^{*} - \alpha_i \right) G(x_i, x), \qquad 0 \leq \alpha_i^{*} \leq C, \quad 0 \leq \alpha_i \leq C \qquad (6)$$

where α_(i) and α_(i)* are the Lagrange multipliers corresponding to ξ and ξ*, N_(SV) is the number of support vectors and G(x_i, x) is the kernel function. The constant C influences a trade-off between an approximation error and the weight vector norm ∥w∥ and is a design parameter that is chosen by the user. An increase in C penalizes larger errors (large ξ and ξ*) and in this way leads to a decrease in the approximation error. However, this can be achieved only by increasing the weight vector norm ∥w∥. At the same time, an increase in ∥w∥ does not guarantee good generalization performance of a model. Another design parameter which is chosen by the user is the required precision, embodied in an ε value that defines the size of the ε-insensitivity zone.
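As a sketch of how a trained model of the form of equation (6) is evaluated, a Gaussian kernel is assumed for G (the text elsewhere refers to a Gaussian width, but the kernel choice is open):

```python
import numpy as np

def gaussian_kernel(x_i, x, width=1.0):
    """Gaussian (RBF) kernel; the kernel choice and width are assumptions."""
    return np.exp(-((x_i - x) ** 2) / (2.0 * width ** 2))

def svm_predict(x, support_x, alpha, alpha_star, width=1.0):
    """Evaluate equation (6): f(x) is the sum over support vectors of
    (alpha_i* - alpha_i) * G(x_i, x)."""
    return sum((a_s - a) * gaussian_kernel(x_i, x, width)
               for x_i, a, a_s in zip(support_x, alpha, alpha_star))
```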

There are a few learning parameters in constructing SVMs for regression. The two most relevant are the ε-insensitivity zone and the penalty parameter C, if used. Both parameters should be chosen by the user. An increase in the ε-insensitivity zone means a reduction in requirements on the accuracy of approximation. It decreases the number of support vectors, leading to an increase in data compression. A detailed mathematical description of SVMs may be found in: (a) V. N. Vapnik, The Nature of Statistical Learning Theory, Springer, 1995; (b) V. N. Vapnik, Statistical Learning Theory, John Wiley & Sons, Inc., 1998; and (c) V. Kecman, Learning and Soft Computing: support vector machines, neural networks and fuzzy logic models, The MIT Press, 2001.

FIG. 11 shows graphically the variables referred to above, in particular the ε-insensitivity zone and the slack variables ξ and ξ*.

Support vector machine learning typically uses quadratic programming to solve the optimisation problem in equation (6). While this produces the desired result, quadratic programming is computationally expensive, with even a moderate sized image requiring a few minutes to compress. Equation (6) may instead be transformed into a linear optimisation problem which can be solved using Non-Negative Least Squares (NNLS), typically requiring only a few seconds compared with the few minutes required by a quadratic programming solution.

In a second embodiment of the present invention, the basis (support) vectors may be identified using gradient ascent learning with clipping. This method is an alternative to the NNLS solution described above in relation to support vector machine learning. Gradient ascent learning with clipping belongs to the class of learning machines known as kernel AdaTrons. The learning is performed in dual space and defined as the maximization of the dual Lagrangian shown in equation (7).

$$L_d(\alpha, \alpha^{*}) = -\varepsilon\sum_{i = 1}^{l}\left( \alpha_i^{*} + \alpha_i \right) + \sum_{i = 1}^{l}\left( \alpha_i^{*} - \alpha_i \right) y_i - \frac{1}{2}\sum_{i, j = 1}^{l}\left( \alpha_i^{*} - \alpha_i \right)\left( \alpha_j^{*} - \alpha_j \right) K(x_i, x_j) \qquad (7)$$

where ε is a prescribed size of the insensitivity zone, and α_(i) and α_(i)* (i = 1, . . . , l) are Lagrange multipliers for the points above and below the regression function respectively.

Learning results in l Lagrange multiplier pairs (α, α*). α_(i) (and α_(i)*), which correspond to the slack variables ξ (and ξ*) for SVM learning, will be nonzero values for training points on or ‘above’ (and on or ‘below’) an ε-insensitivity zone. Because no training data can be on both sides of the ε-insensitivity zone, either α_(i) or α_(i)* will be nonzero. For data points inside the ε-insensitivity zone, both multipliers will be equal to zero. After learning, the number of nonzero parameters α_(i) or α_(i)* is equal to the number of basis vectors. Because at least one element of each pair (α_(i), α_(i)*), i = 1, . . . , l, is zero, the product of α_(i) and α_(i)* is always zero.

The dual Lagrangian L_(d) as given in equation (7) is maximized by a series of iterative steps that continue until the predefined accuracy conditions are satisfied.

Kernel AdaTron learning for solving the regression task is here defined by the gradient ascent update rules for α_(i) and α_(i)* shown in equations (8a) and (8b).

$$\Delta\alpha_i = \eta_i\frac{\partial L_d}{\partial\alpha_i} = \eta_i\left( y_i - \varepsilon - \sum_{j = 1}^{l}\left( \alpha_j - \alpha_j^{*} \right) K(x_j, x_i) \right) = \eta_i\left( y_i - \varepsilon - f_i \right) = -\eta_i\left( E_i + \varepsilon \right) \qquad (8a)$$

$$\Delta\alpha_i^{*} = \eta_i\frac{\partial L_d}{\partial\alpha_i^{*}} = \eta_i\left( -y_i - \varepsilon + \sum_{j = 1}^{l}\left( \alpha_j - \alpha_j^{*} \right) K(x_j, x_i) \right) = \eta_i\left( -y_i - \varepsilon + f_i \right) = \eta_i\left( E_i - \varepsilon \right) \qquad (8b)$$

where y_(i) is the measured value for the input x_(i), ε is the prescribed insensitivity zone, and E_(i) = f_(i) − y_(i) stands for the difference between the regression function f at the point x_(i) and the desired target value y_(i) at this point.

Another version of the learning, accounting for the geometry, is provided in equation (9a). For the α_(i)* multipliers, the value of the gradient is shown in equation (9b), and the update value for α_(i) is shown in equation (10a), with an alternative representation in equation (10b).

$$\frac{\partial L_d}{\partial\alpha_i} = -K(x_i, x_i)\,\alpha_i^{*} + y_i - \varepsilon - f_i = -\left( K(x_i, x_i)\,\alpha_i^{*} + E_i + \varepsilon \right) \qquad (9a)$$

$$\frac{\partial L_d}{\partial\alpha_i^{*}} = -K(x_i, x_i)\,\alpha_i + E_i - \varepsilon \qquad (9b)$$

$$\Delta\alpha_i = \eta_i\frac{\partial L_d}{\partial\alpha_i} = -\eta_i\left( K(x_i, x_i)\,\alpha_i^{*} + E_i + \varepsilon \right) \qquad (10a)$$

$$\alpha_i \leftarrow \alpha_i + \Delta\alpha_i = \alpha_i + \eta_i\frac{\partial L_d}{\partial\alpha_i} = \alpha_i - \eta_i\left( K(x_i, x_i)\,\alpha_i^{*} + E_i + \varepsilon \right) \qquad (10b)$$

If the optimal value for the learning rate, η_(i) = 1/K(x_(i), x_(i)), is chosen, the KA, i.e. the gradient ascent learning, is defined by the update rule for α_(i) in equation (11a) and the update rule for α_(i)* in equation (11b).

$$\alpha_i \leftarrow \alpha_i - \alpha_i^{*} - \frac{E_i + \varepsilon}{K(x_i, x_i)} \qquad (11a)$$

$$\alpha_i^{*} \leftarrow \alpha_i^{*} - \alpha_i + \frac{E_i - \varepsilon}{K(x_i, x_i)} \qquad (11b)$$

After each iteration step, the dual variables α_(i) and α_(i)* are clipped between zero and C (0 ≤ α_(i) ≤ C, 0 ≤ α_(i)* ≤ C), according to equations (12a) and (12b).

$$\alpha_i \leftarrow \min\left( \max\left( 0, \alpha_i \right), C \right), \quad i = 1, \ldots, l \qquad (12a)$$

$$\alpha_i^{*} \leftarrow \min\left( \max\left( 0, \alpha_i^{*} \right), C \right), \quad i = 1, \ldots, l \qquad (12b)$$
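The update rules (11a) and (11b) together with the clipping of (12a) and (12b) might be combined into the following sketch. The Gaussian kernel, one-dimensional inputs, fixed sweep count and the sign convention f_i = Σ_j (α_j − α_j*) K(x_j, x_i) of equations (8) to (11) are assumptions made for brevity:

```python
import numpy as np

def kernel_adatron_regression(x, y, eps, C, width=1.0, n_sweeps=200):
    """Kernel AdaTron regression with clipping: coordinate-wise gradient
    ascent on the dual Lagrangian (7) using rules (11a), (11b) and the
    clipping (12a), (12b)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    l = len(x)
    K = np.exp(-((x[:, None] - x[None, :]) ** 2) / (2.0 * width ** 2))
    alpha = np.zeros(l)
    alpha_star = np.zeros(l)
    for _ in range(n_sweeps):
        for i in range(l):
            f_i = K[i] @ (alpha - alpha_star)   # current model output at x_i
            E_i = f_i - y[i]                    # error at x_i
            a = alpha[i] - alpha_star[i] - (E_i + eps) / K[i, i]   # (11a)
            s = alpha_star[i] - alpha[i] + (E_i - eps) / K[i, i]   # (11b)
            alpha[i] = min(max(0.0, a), C)                         # (12a)
            alpha_star[i] = min(max(0.0, s), C)                    # (12b)
    return alpha, alpha_star
```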

The learning can also be performed by using the successive over-relaxation algorithm with clipping, accounting in this way for the constraints related to equation (7). Other techniques for searching for relevant points (basis or support vectors) may also be used (such as nonnegative conjugate gradient). The particular learning process selected to be used with the present invention may vary depending on the particular processing requirements and personal preference.

By way of example, a learning machine trained on the first sixteen coefficients of R1′ plotted in FIG. 6, with the error set to 10 and the Gaussian width set to 1, selects the basis vectors and associated weights shown in Table 1.

TABLE 1

    Basis Vectors:    1       5       6       9      12      14      15
    Weights:         71.29   23.44   18.29    0.60    1.95    5.91   21.97

The outputs of the learning process, which are in the form of basis vectors and their weights, are recorded (step 24). One set of basis vectors, their weights, the DC components and the inversion bits are stored for each block of the original image and collectively these define the estimation of the original image that has resulted from the compression process of the present invention.

In order to reconstruct the image from the compressed data, the above described process is reversed. The basis vectors and inversion data are used to determine the DCT coefficients and then the IDCT is applied to recover the image.

FIG. 7 shows the resulting estimate of the values of the DCT coefficients of the entire row R1′ plotted in FIG. 6 using a learning machine training process. The input points to the learning machine are the same as the DCT coefficients defining the function in FIG. 5 and are represented by a cross (+) and the resulting basis vectors by a circle (◯). With an error set to +/−0.1, 10 basis vectors are required. This can be loosely interpreted as a compression ratio of 63:10 or 6.3:1. FIG. 8 shows the error of the estimation, i.e. the difference between the plot of FIG. 6 and the plot of FIG. 7, with the total accumulated error being 271.

By way of comparison, FIG. 9 shows the resulting estimate of the values of the DCT coefficients of the entire row R1 plotted in FIG. 5 using a learning machine training process with the same error. In this case 34 basis vectors are required. This can be loosely interpreted as a compression ratio of 63:34 or approximately 1.85:1, a decrease in compression by a factor of around 3.5. FIG. 10 shows the error of the estimation, the total accumulated error being 452.

Although the process of inverting portions of a function about a predetermined value prior to applying a learning process to the function has been described herein above as part of an overall process to compress image information, those skilled in the relevant arts will appreciate that the inversion process may be used for the estimation of any function using a learning process. The function may define any information from which an estimate needs to be formed using a learning process, including functions formed from data in the spatial, time or frequency domains.

The extent of compression may be controlled by setting a number of variables in the above described algorithms. For example, the size of the ε-insensitivity zone, the quantization step size for both the AC and DC components, whether or not the DC components are included with the AC components, and whether selected inversion bits are discarded are all configurable variables that allow the compression to be increased or decreased. Establishing a required combination of these configurable variables and any other variables that may be used is considered within the capability of the ordinarily skilled person in the relevant arts and therefore will not be detailed herein.

In addition to the above configuration, advantage may be taken of the tendency of the magnitude of the frequency components in an image to diminish with higher frequency, and of the higher frequency components being less visible to the eye, to further reduce the size of the compressed image data. As can be seen from FIGS. 5 and 7, the largest magnitude DCT coefficients are generally the first sixteen coefficients. Therefore, an additional configuration step may be to discard the seventeenth to sixty-third coefficients (step 17A). As shown in Table 1 above, this reduces the number of basis vectors from ten to seven, leading to a compression ratio of 63:7 or 9:1. The more coefficients discarded, the higher the compression, with a penalty in recovered image quality.

In order to be useful, the data resulting from the compression will need to be communicated or stored. Where a weight has no corresponding basis vector, the value of that weight is set to zero (step 22). Therefore, taking the first 16 coefficients of the row R1′ and the associated basis vectors and weights in Table 1, the resulting data defining the basis vectors and weights (C1), represented in decimal numbers for clarity, would be: C1 = [71.29, 0, 0, 0, 23.44, 18.29, 0, 0, 0.60, 0, 0, 1.95, 0, 5.91, 21.97]

The weights may then be quantized (step 23). Quantization may be performed by identifying the maximum and minimum weight values across all blocks for the image and dividing these into a predefined number of quantization steps. For example, if the minimum weight is 0 and the maximum is 90 and five quantization levels are required, then the weights can only take the values {9, 27, 45, 63, 81}. The minimum step size, in this case 9, may be subtracted from each weight after quantization.
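A sketch of step 23 under the stated reading (weights snap to the mid-points of equal steps between the minimum and maximum weight); with w_min = 0, w_max = 90 and five levels it yields the values {9, 27, 45, 63, 81} used in the example:

```python
import numpy as np

def quantize_weights(weights, w_min, w_max, n_levels):
    """Quantize weights to the mid-points of n_levels equal steps
    between w_min and w_max (step 23)."""
    half_step = (w_max - w_min) / (2 * n_levels)
    levels = w_min + half_step * (2 * np.arange(n_levels) + 1)
    w = np.asarray(weights, dtype=float)
    nearest = np.abs(w[:, None] - levels[None, :]).argmin(axis=1)
    return levels[nearest]
```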

The weights may be combined with the basis vectors and matched with the DCT coefficients. This involves adding a constant to each of the basis vector weights (step 24), the constant being selected to make all basis vector weights positive and non-zero. The sum of the constant and the basis vector weights may be termed the shifted basis vectors. The shifted basis vectors are then inverted or not inverted depending on the inversion bit (step 25). For example, if the inversion bit is a “1”, the basis vector weight may be inverted to its absolute value, but negative. Therefore, a series of positive and negative numbers created in this way, together with information defining the selected scalar, defines the basis vectors, their weights and the inversion bits: the position in the data stream indicates the basis vector (centre of the Gaussian), the absolute value less the constant defines the weight of the basis vector and the sign of the number in the series defines the inversion bit.

For example, taking C1 referred to herein above, each coefficient takes the closest value from the quantization steps 9, 27, 45, 63, 81. If any of the quantization levels are negative, then a positive number of equal magnitude must be added to all weights to ensure that they are positive. After combining the result with the inversion bits, C2 results: C2 = [−81, 9, 9, 9, −27, −27, 9, −9, 9, 9, −9, −9, 9, −9, 27]

To increase compression, where the inversion bit does not correspond to a weight (all the values 9 or −9 in C2), the inversion bit may be discarded (step 26). Discarding the inversion bit means that the negative sign of the value is no longer required, so the affected values are reset to positive, resulting in C3. This introduces a small error when the image is decompressed, but increases compression. C3 = [−81, 9, 9, 9, −27, −27, 9, 9, 9, 9, 9, 9, 9, 9, 27]
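Steps 24 to 26 might be sketched as follows; the treatment of zero weights and discarded inversion bits is an assumption chosen to be consistent with the worked example (a value equal to the bare shift encodes a zero weight, whose inversion bit is discarded and whose sign therefore stays positive):

```python
def combine_with_inversion(weights, inv_bits, shift):
    """Add a constant so every weight is positive and non-zero (step 24),
    then carry each inversion bit as the sign of the value (step 25),
    discarding bits on zero weights (step 26)."""
    combined = []
    for w, bit in zip(weights, inv_bits):
        v = w + shift                     # shifted basis vector weight
        keep_bit = bit and v != shift     # discard bits on zero weights
        combined.append(-v if keep_bit else v)
    return combined
```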

Following step 20 and/or step 26 results in a large number of zeros in the data. Therefore, the weights may be run length encoded and/or entropy encoded (step 27) to further reduce the size of the data to be stored/transmitted. This data is recorded (step 28). If the DC components were removed from the matrix of each block and entropy encoded, the combination of the entropy encoded DC components, the encoded C3, the minimum quantization level and the scalar added to C2 represents the compressed image data.

In FIG. 2, the primary steps of the compression process, steps 10, 12, 17, 21 and 28, are indicated down the left hand side, with solid arrows joining the steps. The other steps, indicated by dashed arrows, represent optional features of the compression process of the present invention, and the various paths that may be followed each represent a possible implementation of the invention. Those skilled in the relevant arts will appreciate that other optional paths through the flowchart in FIG. 2 and other steps not shown in FIG. 2 may be added without departing from the present invention.

Results

To objectively measure image quality, the signal to noise ratio (SNR) is used. The SNR is calculated using equation (13).

$$SNR = 20\log\left\lbrack \frac{1}{width \times height} \times \frac{\sum_{ij} Original\_Image_{ij}}{\sum_{ij}\left( Original\_Image_{ij} - Recovered\_Image_{ij} \right)} \right\rbrack \qquad \text{equation (13)}$$
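The following is a literal transcription of equation (13) as printed. Conventional SNR definitions sum squared terms; the absence of squares here follows the equation exactly as it appears, so this sketch should be read with that caveat:

```python
import numpy as np

def snr(original, recovered):
    """Signal-to-noise ratio per equation (13); width * height is the
    image size in pixels."""
    height, width = original.shape
    ratio = np.sum(original) / np.sum(original - recovered)
    return 20.0 * np.log10(ratio / (width * height))
```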

Results using the process described above and illustrated in FIG. 2 against the benchmark 512×512 ‘Lena’ image are shown in FIG. 12 in comparison with the baseline JPEG algorithm. In FIG. 12, the SNR for the JPEG algorithm is referenced by crosses (+) and the compression algorithm of the present invention by circles (◯). The JPEG algorithm performs better than the algorithm of the present invention for compression ratios up to 22:1 (on this particular image). For compression ratios beyond this, the algorithm of the present invention produces higher quality images for the same compression ratio.

The baseline JPEG algorithm could not compress the 512×512 Lena image at ratios greater than 64:1. The algorithm of the present invention achieved a compression ratio of 192:1 and still achieved better image quality than the image compressed using JPEG at 64:1.

The decompressed images are shown in FIG. 13 for subjective comparison of the algorithm of the present invention with the baseline JPEG algorithm. The original image was 512×512 pixels. The images resulting from decompression of compressed data formed according to the present invention at various compression ratios are referenced X1 to X4 and those resulting from decompression of JPEG data JPG1 and JPG2.

Where in the foregoing description reference has been made to specific components or integers of the invention having known equivalents, then such equivalents are herein incorporated as if individually set forth.

Although this invention has been described by way of example and with reference to possible embodiments thereof, it is to be understood that modifications or improvements may be made thereto without departing from the scope of the invention as defined in the appended claims.

1. A method of producing compressed data including the steps of: a) receiving input data to be compressed; b) transforming said input data into a sum of basis functions multiplied by a corresponding set of coefficients; c) forming a function from said set of coefficients; d) estimating said function using a learning process and recording the resulting estimate; wherein said estimate defines the compressed data, and includes a selection of said basis functions, said selection being determined by the learning process to meet a required accuracy of approximation, whereby the required accuracy of approximation is defined by an insensitivity zone within which the estimate is permitted to deviate.
2. The method of claim 1, further including subdividing input data received in step a) into a number of portions and performing steps b) through d) on each of said portions.
3. The method of claim 2, wherein the step of forming a function from said set of coefficients includes first removing a DC component from each said portion and recording the DC components as part of the compressed data.
4. The method of claim 3, further including encoding said DC components prior to recording them as part of the compressed data.
5. The method of claim 1, wherein said step of forming a function includes: i) defining a first function as a series of said coefficients alternating about a predetermined value; ii) creating a second function by inverting identified portions of the first function on one side of the predetermined value and recording inversion information defining the portions that have been inverted; wherein said second function defines the function to be estimated in step d).
6. The method of claim 5, wherein the learning process identifies a series of basis vectors and wherein the method further includes associating each basis vector with a corresponding coefficient in said function.
7. The method of claim 6, further including discarding the inversion information for any coefficient that does not have an associated basis vector.
8. The method of claim 7, further including discarding weights calculated by the learning process that do not have an associated basis vector.
9. The method of claim 6, further including discarding the inversion information for any coefficient where at least one of its absolute value and reconstructed value is less than a predetermined magnitude.
10. The method of claim 6, including storing the basis vectors as a data series, wherein the position of a value in the data series defines the basis vector and the value defines the basis vector's weight.
11. The method of claim 10, including shifting the values in the data series so that they have the same sign and are non-zero, recording the value by which the values have been shifted as part of the compressed data and combining the basis vectors with the inversion information by changing the sign of the values dependent on the inversion information.
12. The method of claim 10, including encoding the data series using entropy encoding.
13. The method of claim 1, wherein the step of transforming said input data into a sum of basis functions multiplied by a corresponding set of coefficients includes applying the discrete cosine transform to the input data.
14. The method of claim 1, wherein the input data defines a two-dimensional image.
15. A method of creating information defining an estimate of a function, the method including: i) receiving input information including a number of values defining a first function, wherein the first function is to be estimated; ii) creating a second function by inverting identified portions of the first function on one side of a predetermined value and recording inversion information defining the portions that have been inverted; iii) estimating said second function using a learning process and recording information defining a resulting estimate; wherein the resulting estimate and inversion information are collectively the information defining an estimate of the first function, and the estimate includes a selection of said values, said selection being determined by the learning process to meet a required accuracy of approximation, whereby the required accuracy of approximation is defined by an insensitivity zone within which the estimate is permitted to deviate.
16. The method of claim 15, wherein the learning process computes a series of basis vectors and the method includes associating each basis vector with a corresponding value in said second function.
17. The method of claim 16, further including discarding the inversion information for any value that does not have an associated basis vector.
18. The method of claim 17, further including discarding weights computed by the learning machine training that do not have an associated basis vector.
19. The method of claim 16, further including discarding the inversion information for any coefficient where one or both of its absolute value and reconstructed value is less than a predetermined magnitude.
20. The method of claim 16, including storing the basis vectors as a data series, wherein the position of a value in the data series defines the basis vector and the value defines the basis vector's weight.
21. The method of claim 20, including shifting the values in the data series so that all the values have the same sign and are non-zero, recording the value by which the values have been shifted as part of the information defining an estimate of the first function and combining the basis vectors with the inversion information by changing the sign of the values dependent on the inversion information.
22. The method of claim 20, including encoding the data series using entropy encoding.
23. A computerized apparatus for producing compressed data, the computerized apparatus including receiving means to receive input data to be compressed and processing means operable to: a) transform said input data into a sum of basis functions multiplied by a corresponding set of coefficients; b) form a function from said set of coefficients; c) estimate said function using a learning process; wherein said estimate defines the compressed data, and includes a selection of said basis functions, said selection being determined by the learning process to meet a required accuracy of approximation, whereby the required accuracy of approximation is defined by an insensitivity zone within which the estimate is permitted to deviate.
24. The computerised apparatus of claim 23, wherein the processing means is operable to form a function by: i) defining a first function as a series of said coefficients alternating about a predetermined value; ii) creating a second function by inverting identified portions of the first function on one side of the predetermined value and recording inversion information defining the portions that have been inverted; wherein said second function defines the function to be estimated in step c).
25. The computerised apparatus of claim 23 when used to compress image or sound information, wherein the processing means is operable to transform image information from the spatial domain to the frequency domain to create said set of coefficients.
26. A computerized apparatus for creating information defining an estimate of a function, the computerized apparatus including receiving means for receiving input information including a number of values defining a first function, wherein the first function is to be estimated, and processing means operable to: i) create a second function by inverting identified portions of the first function on one side of a predetermined value and recording inversion information defining the portions that have been inverted; ii) estimate said second function using a learning process and recording information defining a resulting estimate; wherein the resulting estimate and inversion information are collectively the information defining an estimate of the first function, and the estimate includes a selection of said values, said selection being determined by the learning process to meet a required accuracy of approximation, whereby the required accuracy of approximation is defined by an insensitivity zone within which the estimate is permitted to deviate.
27. A computer readable medium containing a software product for producing compressed data, the software product including an instruction set including instructions to cause a processing means or a combination of processing means to: a) transform said input data into a sum of basis functions multiplied by a corresponding set of coefficients; b) form a function from said set of coefficients; c) estimate said function using a learning process; wherein the estimate defines the compressed data, and includes a selection of said basis functions, said selection being determined by the learning process to meet a required accuracy of approximation, whereby the required accuracy of approximation is defined by an insensitivity zone within which the estimate is permitted to deviate.
28. A computer readable medium containing a software product for determining an estimate of a function, the software product including an instruction set including instructions to cause a processing means or a combination of processing means to: i) receive information defining a first function, wherein the first function is to be estimated; ii) create a second function by inverting identified portions of the first function on one side of a predetermined value and recording inversion information defining the portions that have been inverted; iii) estimate said second function using a set of parameters that number less than the number of values in said input information and recording information defining the resulting estimate; wherein the resulting estimate and inversion information are collectively the information defining an estimate of the first function, and the estimate includes a selection of said values, said selection being determined by the learning process to meet a required accuracy of approximation, whereby the required accuracy of approximation is defined by an insensitivity zone within which the estimate is permitted to deviate.
29. A method of compressing data comprising the steps of: a) receiving input data and subdividing it into a plurality of portions; and for each portion: b) transforming said input data into a sum of basis functions multiplied by a corresponding set of coefficients; c) removing DC components and recording one of the removed DC components and a compressed form of the DC components; d) defining a first function as a series of said coefficients alternating about a predetermined value; e) creating a second function by inverting identified portions of the first function on one side of the predetermined value and recording inversion information defining the portions that have been inverted; f) estimating said second function using a learning process and recording the resulting estimate; wherein said estimate defines the compressed data and recorded DC components, and includes a selection of said basis functions, said selection being determined by the learning process to meet a required accuracy of approximation, whereby the required accuracy of approximation is defined by an insensitivity zone within which the estimate is permitted to deviate.
30. A method of producing compressed data comprising the steps of: a) receiving input data to be compressed; b) transforming said input data into a sum of basis functions multiplied by a corresponding set of coefficients; c) forming a function from said set of coefficients; d) estimating said function using a learning process involving associating a series of weighted basis vectors with said function and recording the resulting estimate; wherein the estimate formed in step d) defines the compressed data, and includes a selection of said basis functions, said selection being determined by the learning process to meet a required accuracy of approximation, whereby the required accuracy of approximation is defined by an insensitivity zone within which the estimate is permitted to deviate.
31. The method of claim 30, including storing the basis vectors as a data series and encoding the data series using entropy encoding.